Calculating Similarity of Arbitrary Reports

Authors

  • Veronika Thost
  • Alexander Schill

Abstract

Reporting is an essential part of business today, and people spend a lot of time creating meaningful visualizations of their most important data. Surprisingly, the reuse of reports (i.e., applying the same visualization or query to different data) is not common. Recommending proven, existing queries is one part of this reuse. Because there are several report formats and reports target different data sources, matching report queries for recommendation is a complex task that has not yet been addressed. Recent work on query recommendation, which usually aims to support users through database add-ons, focuses on collaborative approaches or on query completion. The idea behind this study is to follow a content-based, combined approach to matching report queries. Using both an abstract and a concrete representation of the queries allows different well-known matching techniques to be applied in parallel. The focus of this work is to evaluate the impact of similarity search and schema matching on query matching. To this end, query matching algorithms based on the two techniques are developed: an efficient, index-based comparison using similarity search, and a fine-grained matching of the queries' parse trees using schema matchers. In addition to an effective combination of these algorithms, a major contribution of this thesis is the creation of an empirical data set of 150 queries with similarity ratings for all 22,500 query pairs, which enables a comprehensive evaluation.
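The abstract only outlines the approach. As a rough illustration, here is a minimal, hypothetical Python sketch of the two ideas it names; it is not the authors' implementation. The abstract representation is assumed to be a query's set of lowercase tokens, compared with Jaccard similarity, and an inverted index restricts scoring to candidates that share at least one token (the index-based similarity search). A simple name-based matcher over identifiers (using `difflib` string similarity) stands in for the parse-tree schema matching. The names `QueryIndex`, `combined_similarity`, and `recommend`, the keyword list, the regular-expression tokenizer, and the 0.5 weight are all illustrative assumptions.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Illustrative (not exhaustive) SQL keywords; every other token is treated as a schema name.
SQL_KEYWORDS = {"select", "from", "where", "group", "by", "order", "having",
                "join", "on", "as", "and", "or", "not", "sum", "count", "avg", "min", "max"}


def tokens(query: str) -> set[str]:
    """Abstract representation (assumed): the set of lowercase word tokens."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", query.lower()))


def identifiers(query: str) -> set[str]:
    """Tokens that are not SQL keywords, i.e. likely table and column names."""
    return tokens(query) - SQL_KEYWORDS


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity; two empty sets are treated as identical."""
    return len(a & b) / len(a | b) if a or b else 1.0


class QueryIndex:
    """Inverted index (token -> query ids) used for cheap candidate generation."""

    def __init__(self) -> None:
        self.queries: list[str] = []
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, query: str) -> int:
        qid = len(self.queries)
        self.queries.append(query)
        for tok in tokens(query):
            self.postings[tok].add(qid)
        return qid

    def candidates(self, query: str) -> set[int]:
        # Only stored queries sharing at least one token can have non-zero Jaccard similarity.
        found: set[int] = set()
        for tok in tokens(query):
            found |= self.postings.get(tok, set())
        return found


def _best_match_avg(src: set[str], dst: set[str]) -> float:
    # For each name in src, take its most similar name in dst, then average those scores.
    return sum(max(SequenceMatcher(None, s, d).ratio() for d in dst) for s in src) / len(src)


def name_match_score(a: set[str], b: set[str]) -> float:
    """Simple name-based matcher over identifiers, averaged over both directions."""
    if not a or not b:
        return 0.0
    return (_best_match_avg(a, b) + _best_match_avg(b, a)) / 2


def combined_similarity(q1: str, q2: str, weight: float = 0.5) -> float:
    """Weighted mix of token overlap and identifier-name similarity (weight is arbitrary)."""
    return (weight * jaccard(tokens(q1), tokens(q2))
            + (1 - weight) * name_match_score(identifiers(q1), identifiers(q2)))


def recommend(index: QueryIndex, query: str, k: int = 3) -> list[tuple[float, int]]:
    """Return up to k (score, query id) pairs, best first, among the index candidates."""
    scored = [(combined_similarity(query, index.queries[qid]), qid)
              for qid in index.candidates(query)]
    scored.sort(reverse=True)
    return scored[:k]


if __name__ == "__main__":
    idx = QueryIndex()
    idx.add("SELECT region, SUM(revenue) FROM sales GROUP BY region")
    idx.add("SELECT country, SUM(turnover) FROM orders GROUP BY country")
    idx.add("SELECT name FROM employees WHERE age > 30")
    probe = "SELECT region, SUM(sales_amount) FROM sales GROUP BY region"
    for score, qid in recommend(idx, probe, k=2):
        print(f"{score:.3f}  {idx.queries[qid]}")
```

The inverted index is what makes the first stage index-based: only queries that share a token with the probe are scored at all. A real system would more likely parse the queries into trees rather than tokenize them with a regular expression, and would tune or learn the combination weight instead of fixing it at 0.5.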

Similar Resources

A New Method for Calculating Propagation Modes of a One Dimensional Photonic Crystal (RESEARCH NOTE)

Photonic band-gap (PBG) crystals offer new dimensions of freedom in controlling propagation of electromagnetic waves. The existence of stop-bands in the transmission characteristic of these crystals makes them a suitable element for the realization of many useful microwave and optical subsystems. In this paper, we calculate the propagation constant of a one-dimensional (1-D) photonic crystal by...

TOPOLOGICAL SIMILARITY OF L-RELATIONS

$L$-fuzzy rough sets are extensions of the classical rough sets by relaxing the equivalence relations to $L$-relations. The topological structures induced by $L$-fuzzy rough sets have opened up the way for applications of topological facts and methods in granular computing. In this paper, we firstly prove that each arbitrary $L$-relation can generate an Alexandrov $L$-topology. Based on this fact, w...

A computational method to analyze the similarity of biological sequences under uncertainty

In this paper, we propose a new method to analyze the difference and similarity of biological sequences, based on fuzzy set theory. Considering the sequence order and some chemical and structural properties, we present a computational method to cluster the biological sequences. Through some examples, we show that the new method is relatively easy and we are able to compare the sequences of arbi...

New distance and similarity measures for hesitant fuzzy soft sets

The hesitant fuzzy soft set (HFSS), as a combination of hesitant fuzzy and soft sets, is regarded as a useful tool for dealing with the uncertainty and ambiguity of real-world problems. In HFSSs, each element is defined in terms of several parameters with arbitrary membership degrees. In addition, distance and similarity measures are considered important tools in different areas such as ...

A New Heuristic Algorithm for Drawing Binary Trees within Arbitrary Polygons Based on Center of Gravity

Graphs are widely used in software engineering, networking, and electrical engineering. In fact, graph drawing is a geometric representation of information. Among graphs, trees receive particular attention because of their capacity for hierarchical extension as well as their role in processing VLSI circuits. Many algorithms have been proposed for drawing binary trees within polygons. However these algorithms generate b...

Calculating Semantic Similarity between Facts

The present paper is devoted to the calculation of semantic similarity between facts. A fact is considered as a single sentence consisting of three parts: “what happened”, “where”, and “when”. We propose a function calculating the semantic similarity and provide some experimental results.

Publication date: 2013